On compressing frequent patterns
نویسندگان
چکیده
A major challenge in frequent-pattern mining is the sheer size of its mining results. To compress the frequent patterns, we propose to cluster frequent patterns with a tightness measure δ (called δ-cluster), and select a representative pattern for each cluster. The problem of finding a minimum set of representative patterns is shown NP-Hard. We develop two greedy methods, RPglobal and RPlocal. The former has the guaranteed compression bound but higher computational complexity. The latter sacrifices the theoretical bounds but is far more efficient. Our performance study shows that the compression quality using RPlocal is very close to RPglobal, and both can reduce the number of closed frequent patterns by almost two orders of magnitude. Furthermore, RPlocal mines even faster than FPClose[10], a very fast closed frequent-pattern mining method. We also show that RPglobal and RPlocal can be combined together to balance the quality and efficiency.
منابع مشابه
Mining Compressing Patterns in a Data Stream
Mining patterns that compress the data well was shown to be an effective approach for extracting meaningful patterns and solving the redundancy issue in frequent pattern mining. Most of the existing works in the literature consider mining compressing patterns from a static database of itemsets or sequences. These approaches require multiple passes through the data and do not scale up with the s...
متن کاملEfficient Associating Mining Approaches for Compressing Incrementally Updatable Native XML Databases
XML-based applications widely apply to data exchange in EC and digital archives. However, the study of compressing Native XML databases has been surprisingly neglected, especially for the huge amount of data and the rapidly updatable database. These two factors give rise to our interest, and motivate us to develop an approach to efficiently compress native XML databases and dynamically maintain...
متن کاملOn compressing frequent patterns q
A major challenge in frequent-pattern mining is the sheer size of its mining results. To compress the frequent patterns, we propose to cluster frequent patterns with a tightness measure d (called d-cluster), and select a representative pattern for each cluster. The problem of finding a minimum set of representative patterns is shown NP-Hard. We develop two greedy methods, RPglobal and RPlocal. ...
متن کاملMining Compressed Repetitive Gapped Sequential Patterns Efficiently
Mining frequent sequential patterns from sequence databases has been a central research topic in data mining and various efficient mining sequential patterns algorithms have been proposed and studied. Recently, in many problem domains (e.g, program execution traces), a novel sequential pattern mining research, called mining repetitive gapped sequential patterns, has attracted the attention of m...
متن کاملA Framework for Exploring the Frequent Patterns based on Activities Sequence
In recent years, the development of the use of location-based tools has made it possible to produce geometric trajectories from the user's movement paths. In this way, users' goal of traveling and related activities can be considered in addition to the geometry and route shape. the user activity trajectory represents the sequence of the visited activities and its related analysis as presented i...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Data Knowl. Eng.
دوره 60 شماره
صفحات -
تاریخ انتشار 2007